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Abstract 

We describe a fast solver for linear systems with reconstructable Cauchy- 
like structure, which requires 0(rn 2 ) floating point operations and 0(rn) 
memory locations, where n is the size of the matrix and r its displacement 
rank. The solver is based on the application of the generalized Schur al- 
gorithm to a suitable augmented matrix, under some assumptions on the 
knots of the Cauchy-like matrix. It includes various pivoting strategies, 
already discussed in the literature, and a new algorithm, which only re- 
quires reconstructability. We have developed a software package, written 
in Matlab and C-MEX, which provides a robust implementation of the 
above method. Our package also includes solvers for Toeplitz(+Hankel)- 
like and Vandermonde-like linear systems, as these structures can be re- 
duced to Cauchy-like by fast and stable transforms. Numerical experi- 
ments demonstrate the effectiveness of the software. 

Keywords: displacement structure, Cauchy-like, Toeplitz(+Hankel)-like, 
Vandermonde-like, generalized Schur algorithm, augmented matrix, Mat- 
lab toolbox. 

1 Introduction 

The idea to connect the solution of some linear problems to suitable augmented 
matrices has often been used in linear algebra; see, e.g., [H|6, 26 for an applica- 
tion to full rank least squares problems. More recently, it was proposed [T7] to 
solve various computational problems involving Toeplitz matrices by evaluating 
a Schur complement of a suitable augmented matrix. 

The reason for this approach lies in the fact that Toeplitz matrices, as well 
as other classes of structured matrices, can be expressed as the solution of a 
displacement equation [9, 18 , and that this property is inherited by their Schur 
complements of any order. This fact allows one to store a structured matrix of 
dimension n in 0(n) memory locations and, which is most important, to apply 
a fast implementation of the Gauss triangularization method (the generalized 
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Schur algorithm) operating directly on the displacement information associated 
to the matrix [10l US] US] ; see also [TH [20] for a review. 

In this paper we use the mathematical tools above mentioned to devise a 
fast algorithm to solve the linear system 

Cx = b, (1) 

where b £ C" and C £ C" x ™ is a Cauchy-like matrix; this means that 



Cii — 



Ml 

LA SA 



(2) 



where (fii^tpj £ C r , U,Sj £ C, i,j = l,...,n, and the asterisk denotes the 
conjugate transpose. 
Given a matrix 

A B 
C D ' 



M 



with A G c ix ™ invertible, we denote its Schur complement of order n by 

S„(M) = D — CA~ X B. 
We associate to the system (p} the augmented matrix 



A c ,b 



C b 

-In O 



(3) 



where /„ is the identity matrix of size n and O is a null matrix. Note that 
Ac,b is Cauchy-like too; see Section [2l We will apply the generalized Schur 
algorithm (GSA) to Ac,b, to compute the solution of the system (p} as the 
Schur complement 

S n (Ac,b) = C~ x h. 

A similar approach was used in [3S] to solve structured least square problems. 

It is remarkable that other classes of structured linear systems can be reduced 
to Cauchy-like systems by fast transforms (see [TO] HI]), for example the systems 
whose matrix is either Toeplitz, Hankel, or Vandermonde. So, this approach is 
quite general. 

As observed in [TO] Q3] , the advantage of applying the GSA to a Cauchy-like 
matrix is that such a structure is invariant under rows/columns exchanges, so it 
is easy to embed a pivoting strategy in the algorithm. The problem of choosing 
a pivoting technique for Cauchy matrices is addressed in [8] [11] [29] . 

Our approach requires 0(n 2 ) floating point operations and 0(n) memory 
locations. On the contrary, the approach adopted in jlOj consists of explic- 
itly computing (and storing) the LU factorization of C, hence requiring 0(n 2 ) 
locations. 

We developed a software package, called dr solve, which includes a Cauchy- 
like solver, some interface routines for other structured linear systems, and vari- 
ous conversion and auxiliary routines. The package is written in Matlab [22] for 
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the most part. The Cauchy-like solver has also been implemented in C language, 
with extensive use of the BLAS library [7] , and it has been linked to Matlab via 
the MEX (Matlab executable) interface library. The software includes a device 
to detect numerical singularity or ill-conditioning of the coefficient matrix. 

Among the other software for structured linear systems publicly available, we 
would like to mention the Toeplitz Package [3J, written in a Russian- American 
collaboration, and toms729 [12], based on a look-ahead Levinson algorithm, 
which we will use in our numerical experiments. Various computer programs for 
structured problems have been developed by the MaSe-team [3T], coordinated 
by Marc Van Barel, and by the members of our research group [3D], coordinated 
by Dario Bini. 

This paper is organized as follows: in Section [2] we recall the displacement 
equations of some classes of structured matrices, and the rules to convert them 
to Cauchy-like form. In Sections [3J and 2] we describe the algorithms for solving 
a Cauchy-like linear system, and the pivoting techniques, respectively, that we 
have implemented. In Section [5] we give some details on the software package, 
and in Section [SJ we discuss the results of a widespread numerical experimenta- 
tion. 

2 Displacement structure 

A matrix A g C nx " is said to satisfy a Sylvester displacement equation if 

EA- AF = GH*, (4) 

where G,H <E C™ xr are full rank matrices, called the generators, E,F 6 C" xn 
are the displacement matrices, and r is the displacement rank (9[[T6l[T8]. This 
representation is particularly relevant when r is significantly smaller than n. 

A Cauchy-like matrix satisfies equation (jj) with E = D t = diag(£i, . . . , t n ), 
F = D s = diag(si, . . . , s n ), and 

G =[</>! ••• 4>n]\ H ■■■ t/>„]\ (5) 

Any matrix A satisfying (0| can be converted to a Cauchy-like one if both E 
and F are diagonalizable. In fact, if E = UDtU^ 1 and F = VD s V~ l , then (gj) 
becomes 

D t C-CD s = G H* , 

where C = U~ 1 AV 1 G = U~ 1 G 1 and H =^V*H. The same transformation 
converts a linear system Ax = b to Cx = b, where b = [7 _1 b and x = Vx. 
This transformation is numerically effective for large matrices if U and V are 
unitary and the computation is fast. 

Some well-known classes of structured matrices which fall within this frame- 
work are reported in Table [TJ see [lOj [13j [14] . The corresponding displacement 
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structure 


E 


F 


u 


V 


D t 




Cauchy-like 


D t 


D s 










Toeplitz-like 


Zt 


Z-i 


Ft 




Di 


D-i 


Toeplitz+Hankel-like 


Y 


Y 1 


S 


c 


D s 




Vandermonde-like 


Dw 




In 




Bw 





Table 1: Displacement matrices for some structured classes. 



matrices are defined as follows: 



D v = diag(ui, . . . , v n ), Z^ = 



In-l 



Y& = 



5 1 
1 1 
1 '■. 



1 







1 



where \<j>\ = 1 and S = 0, 1. Their spectral factorization is analytically known, 
as 

= F+D+F} , Y = SD S S, Yi = CD C C T , 

where, setting ui = e 2 ^ 1 , q\ = 2 -1 / 2 , qn = 1, I = 2, . . . , n, and denoting by 
the minimal phase nth root of <fi, 



F, = (4 



-k( 



k,e=o 



F =diag(r fc/ ")Lo- F i' 



and 



S 

c 



^-sin-^ 

n+l n+1 



2 (2fc-l)(i-l)ir 

-0/ cos i ir — 



D 1 =diag(o; fc )^ =0 , 
= <f> 1/n ■ Di, 

D s =diag(2cos^)^ =1 , 
D c =diag(2cos^-^) fc=1 . 



The classes listed in Table [T] generalize the classical Cauchy, Toeplitz, Han- 
kel and Vandermonde matrices. The generators reported below allow one to 
immerse such structured matrices into the corresponding displacement class: 

1. Cauchy matrices C — ( t ^ s we have G = H = [1 . . . 1] T ; 

\n-i . 



2. Toeplitz matrices T = [ti-j)i j—o- 



G = 



to 

tl-n+tl 



t-i+tn-! 



H 



i„_i-i- 



tl—tl-n 
to 
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3. Toeplitz+Hankel matrices K = {ti-j + hi + j)™j^ : 



G = 



H* 



to — ti+ho 
h - t 2 +hi 



t-n + 2 



- t-n + l + h n — h n + l 




-1 



— tn-l+hn~2 — h n -3 

tn— l+hn— 1 — h n —2 
••• 

t-2 + ho ■ ■ ■ t- n +l+hn-3 

tn-l+hn+1 ■ ■ ■ t'2+h2n-2 
••• 



t-1 — £-2+/l2«-3 — hln-1 
to — t-\-\-hln-Z 



4. Vandermonde matrices W — (w™ J )J 



G 1 





h n -2 

h 
-l 



H 1 



1 



with u>™ 7^ (/), i = 1, . 



Any Hankel matrix can be trivially transformed into Toeplitz by reverting the 
order of its rows, so there is not need to treat this class separately. 

Our Matlab routines to convert each displacement structure to Cauchy-like 
are listed in Tabled] For example, t2cl and tl2cl transform a Toeplitz and 
a Toeplitz-like matrix, respectively, into a Cauchy-like matrix. A complete 
description of each routine in the package is available using the Matlab help 
command. Fast routines to apply the matrices i 7 ^, S, C, and their inverses, to 
a vector are provided in the functions f times, stimes and ctimes; see TableEl 

Remark 2.1 A Cauchy-like matrix is uniquely identified by its displacement 
and generators if ti =^ Sj, for any i, j. Conversely, if there exist indices k, I 
such that tk = se, then (p^ipe = 0, and Cm cannot be recovered by ([2]); in such 
a case the matrix C is said to be partially reconstructable. 

Remark 2.2 A Toeplitz-like matrix of size m x n has displacement structure 
with respect to any pair of displacement matrices and Z n , with £, 77 G C. The 
choice £ = 1 and r\ = c 17rS { ~^r- L has been proved in [25] to be optimal, under the 



constraints |£| = \r)\ = I, in the sense that it ensures that the minimum value 
assumed by the denominator in ([2]) is as large as possible. 

Remark 2.3 If the displacement structure of a matrix A is known, it is im- 
mediate to obtain a structured representation for its adjoint and its inverse. In 
fact, from (|4]) it follows that 

F*A* - A*E* = -HG*, 
FA" 1 - A~ X E = {-A- X G){E*A~ X \ 

so that A* has displacement matrices (F*,E*) and generators (—H,G), while 
A^ 1 has displacement matrices (F,E). Its generators (G,H) are the solutions 
of the linear systems 

AG = -G, A*H = H, 
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and can be computed by any of the algorithms described in the following sections. 



Our approach to solve a linear system characterized by any of the above 
discussed displacement structures is to preliminarily transform it to a Cauchy- 
like system, and to apply an optimized implementation of the generalized Schur 
algorithm to the augmented matrix (j3]). We briefly describe here the displace- 
ment structure of this augmented matrix. Let us consider the linear system ([T]), 
where x, b s C nxd , d > 1 in the case of multiple right hand sides. The matrix 
Ac,b associated to the system inherits a displacement structure, as the following 
equation holds 



G [D t ~ 1 I n )h 




H 
I d 



(6) 



for any 7 g C, where e = [1, . . . , 1] T e R d and G, H are given in ((SJ). This 
proves that Ac,b is a Cauchy-like matrix of displacement rank r + d. 

We assume that C is reconstructable. Even so, the diagonal entries of the 
(2, l)-block of Ac,b are nonreconstructable, and the off-diagonal ones are recon- 
structable if and only if there are no repetitions in s, i.e., Si 7^ Sj for i =/= j. The 
blocks (1, 2) and (2, 2) are reconstructable whenever 7 ^ ti, Si, i = 1, . . . , n. 

It is possible to associate the augmented matrix (j3|) to linear systems having 
any of the structures reported in Table [TJ but the original structure is preserved 
only in the Cauchy-like case. Moreover, these structures are not pivoting invari- 
ant. For these reasons, we do not consider this approach competitive with the 
one we follow, in particular for what regards stability. 



3 Solution of a Cauchy-like linear system 

Let C € C nxn be a nonsingular Cauchy-like matrix; we want to solve the lin- 
ear system (fT]) where x, b 6 C nxd . In this Section we describe the core of our 
solver, that is an implementation of the generalized Schur algorithm (GSA) for 
the augmented Cauchy-like matrix Ac.h ©, assuming that C has an LU factor- 
ization; several pivoting strategies to treat the general case will be described in 
Section [4] We start describing the GSA for computing the LU factorization of 
a rectangular Cauchy-like matrix A by recursive Schur complementation, then 
we apply such factorization to the case A — Ac,b- 

We introduce some notation which will be needed throughout this paper. Let 
v = (vi, . . . , v n ) T be a vector of size n and let A = (a^) be an m x n matrix. 
We use Matlab notation for componentwise division (./) and for subindexing 
(:), i.e., 

v./w = (vi/wi, . . .,v n /w n ) T 1 

A 2 -.7,: = (aij), i = 2,...,7, j = 1, . . . ,n, 

and define v = [V2, ■ ■ ■ , v n ) T and A — A^-.m^-.n- Moreover, in the algorithms 
a b means the usual assignment of b to the variable a. 
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3.1 GSA for Cauchy-like matrices 



The first step of Gauss algorithm applied to a matrix A 6 
factorization 

.4 



~d u*~ 




' 1 




~d u* 


£ A_ 




1 t T 

^ 1 m—l 




S 1 (A\ 



computes the 
(7) 



where Si(A) — A — ^u* is the first Schur complement of A. In Lemma 1.1 
of [10] it has been proved that, if A is a Cauchy-like matrix with generators 
G G C mxr , H G C nxr , and displacement vectors t G C"\ s G C'\ i.e., if it 
satishes the displacement equation 



D t A — AD 3 = GH* , 
then S\(A) is a Cauchy-like matrix too, satisfying the equation 

D t S 1 {A)-S 1 {A) D s = GH*. 
Here the generators G, H are defined by 



(8) 



= G — 



1 



= H 



1 



#1, 



(9) 



The GSA consists of the recursive application of the decomposition ([7]) to 
Si(A), operating on the data which define the displacement structure. At the 
fc-th iteration, the quantities d, £ and u are computed using formula ([2]); then 
the generators are updated according to ©. It's important to remark that the 
step k of GSA computes the generators of Sf-(A), since Sk(A) — Si(Sk-i(A)) , 
and that each Sk(A) is reconstructable if A is. 

If the LU factors of A are required, then the vectors and [d u*] must 
be stored at each iteration: after the p-th iteration, p(m + n — p) memory 
locations are required to store these vectors, and a partial LU factorization of 
A is computed 



A = 





Al2 




L 


A 2 i 


A 2 2_ 




A 21 U- X I m 



u 



L~ l A 12 
S P (A) 



(10) 



where An = LU GC pxp . If the LU factors are not required, it is possible to 
discard d, £ and u after the computation of G and H . This is the case we are 
interested in, since our aim is to compute S n (Ac,b)- 

The GSA for the computation of the Schur complement S p (A) is described in 
Algorithm[TJ The memory locations required by the algorithm are (m+n)(r+2). 
The computational cost is p((4r + l)a + 1) floating point operations (flops), if 
the input is real, and 2p(5(a+l)+8r(a+j)) flops, if the input is complex, where 
a = m + n — p, and assuming 2 flops for each complex sum, 6 for each product 
and 10 for each division. The algorithm outputs the displacement matrices and 
the generators of S P (A). 

Remark 3.1 Algorithm[]]requires that the Cauchy-like matrix A is reconstruct- 
able. If it is not, the algorithm must be modified in order to store the nonrecon- 
structable entries. This can be done by employing part of the generators, since 
after the step k the first k rows of both G and H become unused. 



7 



Algorithm 1 GSA for computing S p (A), assuming D t A — AD S — GH* . 
l: Require: m,n,r,p G N + s.t. p < min(m, n) 

2: Require: t G C m , s G C n , G G C mxr , _ff G C nxr s.t. i» / Sj , Vi, j 

3: H ^ H* 

4: for fe = 1 to p 

5: d <- (Gfe, : ■ H : ,k)/(t k - S fe ) 

6: if d = then STOP 

7: £ <— (Gk + l:m,: ' H- ,fc ) ./ (tk + l:m ~ Sfc) 

8: U < (Gk,: ■ H : ,k+l:n)-/{tk - Sfc+l:n) 

9: Gfc + l ;m , : <— Gk + l:m,: — £ ' (^Gfc,;) 

10: H u k + l:n <~ H-.,k + l:n — {\H:,k) ' U 

li: end for 

12: Output: t p +l ;m , S p +i :n , G p +l :mj :, 



3.2 GSA for the augmented matrix .Acb 



To compute <S n (^c,b) we apply n steps of the GSA to the matrix Ac,h', in this 
case the factorization (fTU)) reads 



" C b" 




L 




~U L^b 


.-/„ o_ 




-u- 1 I n 




S n (Ac,h) 



(11) 



and it emphasizes the fact that L and XJ~ X are computed by columns, and U by 
rows. We observe that, in this case, the reconstructability requirement in line 2 
of Algorithm [T] is not satisfied. This causes no problems when Si ^ Sj, Vz ^ j, 
since the nonreconstructable entries of Ac,h are located along the diagonal of 
its (2, l)-block, and are known to be —1. On the contrary, in the presence of 
repeated entries in the vector s, Algorithm Q] is not applicable; an algorithm to 
treat this case will be proposed in Section 13.31 

One can simultaneously compute the solution of d linear systems, by setting 
x, b G C nxd in Algorithm [TJ see ([6]). A straightforward application yields a cost 
of {8r + 8d+2)n 2 + 0(n) flops if the input is real, and (32r + 32d+20)n 2 + O(n) 
flops if the input is complex. This cost can be reduced thanks to the following 
remarks. 

Since in general the right hand side b is unstructured, it is convenient to 
store it as a matrix, rather than by means of the displacement relation ([6]). 
We adapted our algorithm in order to explicitly store the d rightmost columns 
of Ac,b, i-e., [q], and to perform "traditional" Gauss elimination on them. 
Accordingly, we used the displacement equation 



C 



C 
-In 



D s = 



G c 




Hr 



(12) 



to represent the n leftmost columns of Acb- 
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By the particular structure of Ac,b (see (fTTj) ). the step k of Algorithm [T] 
yields a vector £ £ C^n-fc wnose i as t n — k entries vanish, so that it can be 
stored in a vector of length n. A consequence is that, at step k, the GSA only 
needs to modify the rows from k + 1 to n + fc of .4c,b , and to do so only such 
rows are changed in the left generator. Given its initial form (see 0), only 
n rows of the left generator are essential, regardless the index k. To optimize 
the memory access, we reuse the first k rows of Gc to store the rows of the 
left generator ranging from n + 1 to n + k; see also Remark 13.11 The same 
procedure is applied to the vector £ and to the submatrix [ ^ ] , whose essential 
part is stored in b. 



Algorithm 2 Solver for Cx = b, assuming DtC — CD S = GH* (sj ^ Sj). 

l: Require: n, r, d 6 N + 

2: Require: t,s € C"; G,H £ C nxr ; b £ C nxd 

s.t. ti 7^ Sj, Vi,j, and 7^ Sj, Vi 7^ j 

3: H ^ H* 

4: for fc = 1 to n 

5: t^(G-H.., h )./{[ Sl : k -V,t k :n]-S k ) 

6: if = then STOP 

7: U <- (G k ,: ■ H.„ k+1[n )./(t k - sl +1:n ) 

8: d<-£k 

9: 4-5--1 
10: g^±G fc , : 
11: Gfc, : ^0 

12: G^G-£ g 

13: f <- |b*,: 

14: hk,: <— 

15: b^b-^-f 

16: H :> fc+l:ri <~ #:,fc+l:rc — {~3,H--,k) • U 

17: end for 

18: Output: x <s— b 



This leads to Algorithm [5] We observe that it approximately needs (2r + 
d + 2)n memory locations: the storage is then linear in n, while the original 
GKO algorithm [10] requires 0(n 2 ) locations. Moreover, the storage required 
by Algorithm [2] is almost minimal, since it uses only two extra n- vectors besides 
the right hand side b and the displacement data of C . The required flops are 
(6r + 2d+ §)n 2 + 0(ri) in the real case and (24r + 8d + 15)n 2 + 0(n) in the 
complex case. If we compare it to Algorithm [T] when r — 2, which is the case 
that occurs when we solve Toeplitz systems, the required flops scale to 61% if 
d = 1 and 45% if d = 4. 

The requirements ti 7^ Sj, Vi, j, and Si 7^ Sj, Vi 7^ j, guarantee that Ac,b is 
reconstructable, apart from the diagonal entries of its (2, l)-block. 
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3.3 The case of multiple s« 

As already observed in Section 13.21 if some repetitions occur in the vector s, 
then Algorithm [2] is not applicable, because the upper triangular part of the 
(2, l)-block of Ac,b has some off-diagonal entries which are not reconstructable, 
and thus the vector t cannot be computed by line 5 of Algorithm [5J It is worth 
noting that no entry of s (or of t) can be repeated more than r times, otherwise 
C would be singular. 

To overcome this difficulty there are at least two possibilities. Both are based 
on a permutation of s, which induces a change of variable in Cx = b, as shown 
in the following trivial proposition. 

Proposition 3.2 Let C E C nxn , t,s £ C n , G,H £ C nxr , b £ C nxd , and 
assume that 

D t C-CD s = GH*. 
If Q is a permutation matrix, then 

D t C-CD~ s = GH*, 

where C — CQ T , s = Qs, and H = QH . When C is nonsingular, the solution 
of Cx — h is related to the one of Cx = b via x = Qx. 

3.3.1 Redundant-injective splitting of s 

The simplest way to deal with repeated entries in s in Algorithm [2] is to choose 
Q such that s = Qs = [ - 1 ] , where §2 <E C has no repetitions and v is as 
large as possible. We also partition x = C/x = [j 1 ] and b = so that 

X2,b2 £ <C vxd . Since C is invertible, v ^ \n/r]. 

By adapting Algorithm [2] to the augmented matrix 

J v = [0 -I u ] 6 M^ x ", 

we obtain S n {B^, b ) = — J^C _1 b = X2. Then, the vector xi can be computed 
by solving the system 

Cnxn = (bi - Ci 2 x 2 ), (13) 

where the matrix [Cn C12] contains the first n — v rows of C. 

This approach is simple, but has two main disadvantages. The first one 
is that we need extra storage for Cn and C12, since the initial generators get 
overwritten during the algorithm. The second is that it is reasonable to solve the 
system (|13p by an unstructured method only if n — v is small. If this condition 
is not met, a possible strategy is to apply recursively the same technique to Cn, 
but this approach would lead, in the worst case, to r — 1 recursive steps. 

Remark 3.3 We observe that when applying our Matlab implementation of Al- 
gorithm^ to a Cauchy-like matrix with repetitions in s, only n—V components 



C b 

Jv 



10 



of the computed solution vector are affected by the overflows caused by nonrecon- 
structable entries (i.e., they are Inf or NaNj, while the remaining v components 
are correct. This is due to Matlab full implementation of IEEE floating-point 
arithmetic. 



3.3.2 Gathering of s 

A different way to deal with repeated entries in s happens to be very effective 
both from the point of view of storage and computational cost. 

Let o~j, j = 1, . . . , v, be the distinct entries of s, each with multiplicity [ij , and 
choose a permutation matrix Q such that each subset of repeated components 
is gathered, i.e., occupies contiguous entries in s = Qs. It follows that s can be 
partitioned as 



r(l) 



r(«0 



where tr^' = aj 



£ C^ 3 and Hj = n. 
i=i 



The order of the <jj is not important and Q is not unique. Since C is nonsingular, 
Mj ^r, j = l,...,v. 

Changing variable according to Proposition ^ . 2l causes the nonreconstructable 
entries in the (2, l)-block of Aq b to be grouped in the square blocks Tj, j — 
1, . . . , v, each one of size /ij, located along the diagonal of the (2, l)-block. If a 
o~j occurs only once in s (fij = 1) then Tj is a scalar. 

If we apply Algorithm [2] to the system Cx = b, the vector i computed at 
step k intersects only one nonreconstructable block, say Tj(k), and we define ctk 
to be the column index in Aq b of the first column of Tj(k) ■ Since the (2, l)-block 
of Ag b is upper triangular and has —1 along its diagonal, during the algorithm 
we need to explicitly store and update the strictly upper triangular part of each 
Tj. Moreover, at step k we need to store only the first k — + 1 rows of Tj(k), 
i.e., those ranging from n + to n + k. These rows fit in a natural way into 
H ak: k t : since Hj(k) ^ t; see also Remark 13.11 The update of Tjik) is performed 
by standard Gaussian elimination, as we do for the rightmost columns of Aq b . 

The above technique can be embedded in Algorithm^ to make it applicable 
in the presence of repeated entries in the vector s. The result is a general 
method, outlined in Algorithm^ for the solution of a Cauchy-like linear system 
whose coefficient matrix is reconstructable and nonsingular. Besides the vector 
a. = [ai,..., a n ] T , we use the auxiliary vector u = [u)\, . . . , to n ] T , where is 
the column index in Aq b of the last column of Tj(k) ■ The vectors a and uj can 
be constructed in Matlab with library functions at negligible cost; obviously, in 
the code the computation involving the permutation matrix Q is handled via a 
vector of indices. 

Whenever the vector £ intersects a block Tj, some of its entries have to 
be extracted from H; see lines 5 . 1-5 . 3 in Algorithm [3] The update of the 
nonreconstructable block Tj(k) is performed at lines 16.1-16.6. This strategy 
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Algorithm 3 Solver for Cx = b, assuming D t C — CD S = GH* (mult, knots). 



l: Require: n, r, d G N + 

2: Require: t,s G C n ; G, H G C nxr ; b G C nxd s.t. U / Sj , 

3.1: Compute: Q, a, uj (see text) 

3.2: H <- (QH)* 

3.3: S Qs 

4: for fc = 1 : n 

5.1: ^l: Qfc -l <— (Gl:a k -1,: ■ H- ; fc) . / (si: Qfc - 1 — Sfc) 

5.2: if a fc < k then £ ah :k-l «~ (H k -a k ,a k :k-l) T 

5.3: £fc ;n (Gk:n,: ' H :,k) ■ / (tk:n — Sfc) 

{ Zmes /rom to 16 of Algorithm^} 
16.1: if fc < 0;^ 

16.2: 7 <S— k — Q fe + 1 

16.3: S <— CJfe — Qfc 

16.4: i? 7 :«,fc «- 

16.5: H T .S,a k :k «- H T .S,a k :k ~ (^Ui ; „ fc _fe) T • (£a k :k) T 

16.6: end if 

17: end for 

18: Output: x <s— Q T b 



of handling repetitions in s does not increase the overall complexity with respect 
to Algorithm [2] 

We note that this approach can be numerically effective also when some en- 
tries of the displacement vector s are very close. In this case, in order to avoid 
small denominators in formula ([2]) , it may be convenient to approximate the sys- 
tem matrix by collapsing the clustered entries in s, and then apply Algorithm^ 
This possibility will be explored in Section [6] 

4 Pivoting strategies 

Algorithms [5] and [3] require that the LU factorization of C exists. Both to avoid 
this requirement, and for stability issues, it is important to introduce a pivoting 
strategy in the algorithms. Any such strategy should not increase the compu- 
tational cost and the amount of memory required, which are 0{n 2 ) and 0(n), 
respectively. This condition is met by partial pivoting, while complete pivot- 
ing would raise the computational cost to 0(n 3 ), since it requires the explicit 
computation of a full matrix at each step. In [29] and [11], the authors pro- 
pose some approximations to complete pivoting, which keep the computational 
cost quadratic. We adapted these strategies to our augmented matrix approach 
for solving a Cauchy-like linear system, keeping the storage linear. We imple- 
mented three variations of Algorithm [2] in the case there are no repetitions in 
s, to include: 
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a) partial pivoting (Algorithm 2]) ; 

b) Sweet & Brent's pivoting [25] (Algorithm [5]) ; 

c) Gu's pivoting and generator scaling technique [TT] (Algorithm [5]) . 

We also implemented partial pivoting for Algorithmic i.e., for the case when 
there are repetitions in s; see Algorithm 21 In this case, only partial pivoting 
is suitable, since it preserves the ordering of s, as it will be made clear in 
the following. For comparison issues only, we included complete pivoting in 
Algorithm [7] 

Since our goal is to compute the solution to system (JlJ as the Schur comple- 
ment of order n of the augmented matrix ([3]) , pivoting must be performed only 
on the first n rows and columns of Ac,h (or Aq b ). If we operate only on rows, 
factorization (fTTj) is replaced by 
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while if both rows and columns are permuted, it is convenient to apply the 
factorization 
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In (|15p . when columns i and j are swapped, we also exchange rows n+i and n+j 
of the augmented matrix. This unusual permutation does not affect the com- 
putation, as the GSA performs the same arithmetic operations using a different 
rows ordering, with the only difference of computing a permutation Qx of the 
system solution. The advantage of this approach is that the nonreconstructable 
entries stay along the diagonal of the (2, l)-block. 

As mentioned above, we applied only partial pivoting to Algorithm [3j as 
column pivoting induces a permutation in the right displacement vector s. So, 
in this case only factorization (|T4"f is employed, replacing C by C. 



4.1 Partial pivoting 

Partial pivoting can be included in Algorithms [5] and [3] by inserting two instruc- 
tions before line 6, as shown in Algorithm [4] 

Algorithm 4 Solver for Cx = b with partial pivoting. 
{ Insert in Algorithm^ and\B[ } 

5.4: FIND i SUCH THAT \£i\ = max \£k:n\ 

5.5: if i ^ k then swap rows k,i of G, t, £, h 
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4.2 Sweet & Brent pivoting 

In [29 , an error analysis is performed for the LU factorization computed by the 
GSA with partial pivoting, in the case of Cauchy and Toeplitz matrices. Sweet 
and Brent obtained an upper bound for the error depending on the LU factors, 
as one might expect, and on the generator growth factors. In fact they showed 
that, when the displacement rank is larger than 1, there can be a large growth 
in the generators entries, and this is reflected in the solution error. 

In order to approximate complete pivoting, which would increase the stabil- 
ity of the algorithm, Sweet and Brent proposed to choose the pivot, at step k, 
by searching both the kth column and the fcth row. The maxima 

Pi = \£h\= max \£i\ and p 2 = \m 2 \ = max \uj\, 

i—k+l.....n j—k+l....,n 

are compared with the actual pivot d; see flTJ . lid < max{pi , p 2 } , then if pi ^ p 2 
the rows k and i\ are swapped, otherwise the columns k and i 2 are swapped. 
Applying this pivoting strategy to the augmented matrix ([3]), whenever d < 
pi < p 2 we also swap the rows n + k and n + i 2 , according to factorization (fl~5j) . 
This originates Algorithm [SJ 

4.3 Gu's pivoting 

In [11] a different approach was proposed to prevent the generator growth. 
At each step a compact QR factorization of the left generator G (see ([8])) is 
computed, and the triangular factor is transferred to the right generator, i.e., 

GH* = (UR)H* = U(HR*)* = GH*, 

where R G C rxr is upper triangular, and U is a matrix with orthonormal 
columns having the same size than G. This procedure has the effect of keeping 
the 2-norm of the left generator constant across the iterations. Moreover, it is 
immediate to observe that the jth column of the right generator H* has the 
same 2-norm than the corresponding column of the product GH * . 

Gu's approximation of complete pivoting consists of selecting the column 
jmax of H* having the largest 2-norm, and swapping the columns k and j ma x 
of the system matrix before performing partial pivoting. Moreover, he observes 
that in most cases it is sufficient to apply this procedure every K steps, instead 
than at each iteration fc, and uses the value K = 10 in his numerical experiments. 

To adapt Gu's technique to our augmented matrix approach, we compute 
the QR factorization of the last n — k + 1 rows of G, and update the generators 
accordingly; see Algorithm [5] 

4.4 Complete pivoting 

To test the effectiveness of the described pivoting techniques, we implemented 
complete pivoting in Algorithm [71 The overall computational cost is 0(n 3 ), 
since the algorithm requires at step k the reconstruction of a square submatrix 
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Algorithm 5 Solver for Cx = b with Sweet and Brent's pivoting, 

l: Require: n, r, d £ N + 

2: Require: t,s e C"; G,H e C nxr ; b £ C nxd 

s.t. t, 7^ Sj, Vi, j, and Si 7^ Sj, Vi 7^ j 

3: H ^ H* 

3.9: Q 1- I n 

4: for fc = 1 to n 

5: £fc:n (Gfc :n ■ H ■ ,k) ■ / (tk:n — Sfc) 

5.1: Ufc+l :n <— (Gfc, : ■ i?:,fc+l:n)./(tfc — s fc+l:n) 

5.2: FIND ii SUCH THAT pi := | = max |£fc :n | 
5.3: FIND 12 SUCH THAT £>2 := |tt» a | = max |Ufc+i :n | 

6: if pi = p2 = then STOP 
6.1: else if p2 > Pi 

6.2: SWAP COLUMNS fc, «2 OF H, S, Q T 

6.3: U i2 <S— £ fc 

6.4: £*.„ «- (Gfc :n ■ H.. lk )./(t k :n - s k) 

6.5: else if ii > fc 

6.6: SWAP ROWS fe, il OF G, t, b 

6.7: Ufc-|_l;„ <— (Gfc, : ■ H :,k + l:n) ■ / (fk — s fc+l:n) 

6.8: end if 

7: £l:fc-l «- (Gi ;fc _i ■ H : ,fc)./(si:fc-l ~ s fc) 

{ Insert lines 8-16 from Algorithm^} 

17: end for 

18: Output: x Q T b 



of size n — k + 1 by formula ([2]). To keep storage linear with respect to n, we 
compute its columns one at a time, saving the largest entry and its position in 
the column. 

4.5 Singularity detection 

A robust linear system solver should be able to detect singularity and to warn 
the user about ill-conditioning. Matlab backslash operator, in the case of a 
square, non sparse matrix A, performs the first task by checking if a diagonal 
element of the U factor of A, computed by the dgetrf routine of LAPACK [T], 
is exactly zero. Ill-conditioning is detected by estimating the 1-norm condition 
number of the linear system by the dgecon routine of LAPACK, and a warning 
is issued if the estimate rcond is less than eps = 2~ 52 . 

In our code we check the singularity in the same way, and we detect ill- 
conditioning by computing the 1-norm condition number of the U factor, not to 
increase the overall computational load. In fact, as factorization (II 1 p shows, we 
compute the matrix U by rows (line 7 of Algorithm [2} and U^ 1 by columns (line 
5). So, it is easy to obtain the 1-norm of U , and to update step by step the 
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Algorithm 6 Solver for Cx = b with Gu's pivoting. 



l: Require: n, r, d, if 6 N + 

2: Require: t,s e C n ; G,H e C nxr ; b £ C nxd 

s.t. t, / Sj, Vi, j, and s 4 7^ Sj, Vz / j 

3: H ^ H* 

3.9: Q ^ I n 

4: for fc = 1 to n 
4.1: if mod(fc, 7f) = 1 and fc ^ n — r + 1 

4.2: [17, i?] = qr(Gfc :n ,:,0) // compact QR factorization 

4.3: G\:k-1,; *~ Gl:k-1,-.R 1 

4.4: G k :n.: <- [7 

4.5: H;,k:n ^ R ■ H-.,k:n 

4.6: FIND l 2 SUCH THAT ||#:,i 2 || 2 = max^^n ||-H":j||2 

4.7: if 12 / fc then SWAP COLUMNS fc, ii OF ii", s, Q T 

4.8: end if 
5: £■(- (G- H.. ik )./([s 1:k - 1 ;t k:n ] -s k ) 

5.4: FIND ii SUCH THAT \£i\ = max|^ fe:n | 

5.5: if i\ k then swap rows fc,ii OF G, t, £, b 
{ Insert lines 6-16 from Algorithm^} 

17: end for 

18: Output: x <s— Q T b 



sums of the columns of U. If the reciprocal of the resulting condition number 
is less than eps, a warning is displayed. 

5 Package description 

Our package is written for the most part in Matlab |22) . and it has been de- 
veloped using version 7.9 (R2009b) on Linux, but we tested it on various other 
versions, starting from 7.4. Full documentation for every function in the package 
is accessible via the Matlab help command, and the code itself is extensively 
commented. 

The package is available at the web page 

http : //bugs .unica. it/~gppe/sof t/ 

and it is distributed as a compressed archive file. By uncompressing it, the 
directory drsolve which contains the software is created. This directory must 
be added to the Matlab search path, either by the addpath command or using 
the menus available in the graphical user interface. Depending on the operating 
system and the Matlab version, some additional work may be required; the 
details of the installation arc discussed in the README.txt file, located in the 
drsolve directory. 
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Algorithm 7 Solver for Cx = b with complete pivoting, 
l: Require: n, r, d £ N + 

2: Require: t,s e C"; G,H e C nxr ; b G C nxd 

s.t. ti 7^ Sj, Vi, j, and Si 7^ Sj, Vi 7^ j 

3.9: Q 1- I n 

4: for fc = 1 to n 
z.l: for j — k to n 

4.2: Vfc :n <- (Gfe !n , : ■ H : j)./(tk:n - Sj) 

4.3: FIND r, , Cj SUCH THAT Cj := \\ r . | = max |Vfc :n | 

4.4: end for 

4.5: FIND 12 SUCH THAT Ci 2 = max(c fc: „) 
4.6: il r i2 

4.7: if ii 7^ k then swap rows fc,ii OF G, t, b 
4.8: if ii =fi k then SWAP COLUMNS fc,l2 OF J/, s, Q T 
{ Insert lines 5-16 from Algorithm^} 

17: end for 

18: Output: x Q T h 



After the installation, the user should execute the script validate .m in the 
subdirectory drsolve/validate. It will check that the installation is correct, 
that all the files work properly, and it will give some hints on how to fix instal- 
lation problems. 

The core of the package is the function clsolve, which contains the Matlab 
implementation of Algorithms [2H3 Since these algorithms are heavily based 
on for loops, which in general degrade the performance of Matlab code, we 
reimplemented this function in C language, using the Matlab C-MEX interface. 
The MEX library allows to compile a Fortran or C subroutine, which can be 
called by Matlab with the usual syntax. To optimize the C code and to take 
advantage from the computer architecture, we made an extensive use of the 
BLAS library [7], keeping at a minimum the use of explicit for loops. When the 
clsolve function is called, the compiled version is executed, if it is available, 
otherwise the Matlab version is executed, issuing a warning message. 

The C code is contained in the drsolve/src subdirectory. Its compilation 
is straightforward on Linux and Mac OS X (and we expect the same behaviour 
on other Unix based platforms), while it is a bit more involved on Matlab for 
Windows, which officially supports the MEX library only for some commercial 
compilers. Some further difficulties are due to the presence of complex vari- 
ables in the code, which are not supported by the minimal C compiler (lcc) 
distributed with Matlab for Windows, and the need to link the BLAS and LA- 
PACK l! libraries. We produced executables on Windows using a porting of 
the GNU-C compiler [28] and the "MEX configurer" Gnumex [27] ■ Executables 
for various versions of Matlab on Linux, Mac OS X, and Windows, are provided 
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structure 



solver conversion 



Cauchy-like 

Toeplitz 

Toeplitz-like 

Toeplitz+Hankel 

Toeplitz+Hankel-like 

Vandermonde 

Vandermonde-like 



vlsolve 



vsolve 



thsolve 
thlsolve 



clsolve 
tsolve 



tlsolve 



tl2cl 



th2cl 
thl2cl 



t2cl 



v2cl 



vl2cl 



Table 2: Solvers and conversion routines. 



in our package; see the README . txt file for details on the compilation process, 
and on the activation of the available executables. 

An important technical remark which affects the 64-bit operating systems is 
the following. Starting from Matlab 7.8, the type of the integer variables used 
in the BLAS and LAPACK libraries provided with Matlab changed from int 
to ptrdif f _t. This has no effect on 32-bit architectures, but on 64-bit systems 
the size of the variables changed from 4 to 8 bytes. For this reason, to compile 
our C-MEX program clsolve . c, running Matlab version 7.7 or less on a 64-bit 
system, it is necessary to uncomment one of the first lines of the program, which 
is clearly highlighted in the code. 

Besides the algorithms for solving Cauchy-like linear systems, coded in the 
function clsolve, the package includes six simple interface programs, listed in 
Table [21 to solve linear systems with the displacement structures discussed in 
Section [2J All these functions transform the system into a Cauchy like one 
by using the corresponding conversion routine (see Table [5]), call clsolve to 
compute the solution, and then recover the solution of the initial system. 

For example, given the Cauchy-like system Cx = b, with displacement 
D t C - CD S = GH* , the commands 

piv = 1; 

x = clsolve(G,H,t,s,b,piv) ; 

solve the system by Algorithm 01 i.e., with partial pivoting. The piv variable is 
used to select the solution algorithm. To solve a Toeplitz linear system Tx = b 
with Gu's pivoting, one can issue the following commands 

piv = 4; 

x = tsolve(c,r,b,piv) ; 

where c and r denote the first column and the first row of T, respectively. 

The conversion routines and the test programs rely on some auxiliary rou- 
tines, listed in Table [3] The most relevant routines perform fast matrix prod- 
ucts involving the unitary matrices F^, S, and C, introduced in Section [51 The 
routines stimes and ctimes, concerning the Toeplitz+Hankel-like structure, re- 
quire the Matlab commands dst and dct/idct, available in the PDE Toolbox 
and in the Signal Processing Toolbox, respectively. 

The subdirectory drsolve/test contains the test programs which reproduce 
the numerical experiments discussed in the next section. 
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f times 


compute F^v 


ttimes 


compute Tv 


ctimes 


compute Cv 


cltimes 


compute Cv 


stimes 


compute Sv 


cl2full 


assemble C 


nrootsl 


nth roots of e"* 







Table 3: Auxiliary routines. 



6 Numerical results 

In this Section we present a selection of numerical experiments, aimed to verify 
the effectiveness of our package, and to compare its performance with other 
methods. 

The numerical results were obtained with Matlab 7.9 (Linux 64-bit version), 
on a single processor computer (AMD Athlon 64 3200+) with 1.5 Gbyte RAM, 
running Debian GNU/Linux 5.0. Each numerical experiment can be repeated 
by running the corresponding script in the drsolve/test directory. Except 
where explicitly noted, the C-MEX version of the clsolve function was used. 
In the experiments, the right hand side of each linear system corresponds to the 
solution (1, . . . , 1) , errors are measured using the infinity norm, and execution 
times are expressed in seconds. 

disp. rank cond(j4) errors exec, time 

Vandermonde 1 5.0 • 10 3 4.3 ■ 10~ 13 /8.3 • 10~ 13 0.80/26.52 

Toeplitz 2 3.5 • 10 4 1.3 • 10~ 12 /4.6 • 10~ 12 0.87/25.12 

Toeplitz+Hankel 4 5.0 ■ 10 5 1.6 ■ 10" 7 /1.5 ■ 10" 11 2.82/22.86 

Cauchy-like 5 4.5 ■ 10 3 2.7 ■ 10~ 12 /5.5 ■ 10~ 13 3.04/22.20 



Table 4: Errors and execution times in the solution of random complex linear 
systems of size 2048, belonging to four different structured classes. The two 
figures in the rightmost columns are obtained by drsolve with partial pivoting, 
and by Matlab backslash, respectively. 

The first test concerns the solution of a complex linear system of size 2048, 
whose matrix is either Vandermonde, Toeplitz, Toeplitz+Hankel, or Cauchy- 
like; in the last case the displacement rank is 5. The linear systems were con- 
structed from random complex data; see the file testl .m for details. The errors 
and execution times, reported in Table [4] were obtained by the solvers listed 
in Table [2j with partial pivoting, and by Matlab backslash. It can be seen that 
in all cases the structured solver is much faster, as expected, and that there 
is a significant loss in accuracy only in the Toeplitz+Hankel case, where the 
condition number is larger. 

In Figure [1] the performance in terms of execution time is further investi- 
gated. In this random real Toeplitz system of size 2 , k = 8, . . . , 15, is 
solved by tsolve and thsolve with partial pivoting (setting to zero the Hankel 
part of the input matrix), and by the subroutine toms729 [12) . based on a look- 
ahead Levinson algorithm, with the pmax parameter set to 10; a Matlab MEX 
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: a T (no MEX; 

■\0~ s ' -b -backslash 10" 3 c> 



□ an 2 11 

-4 -4 

10 1 — 1 — 1 1 — 1 — 1 1 — 1 10 ' — 1 — 1 1 — 1 — 1 — 1 

8 9 10 11 12 13 14 15 8 9 10 11 12 13 14 15 

Figure 1: Execution time on a single processor computer (left) and on a quad- 
core computer (right) for a random real Toeplitz system of size 2 fe , k = 8, . . . , 15. 



gateway for this subroutine is available [2J. These methods are also compared, 
for n < 4096, to Matlab backslash, and to a modified version of tsolve (the 
corresponding data are labelled as "no MEX"), which calls the Matlab imple- 
mentation of clsolve. The computation is repeated on a 4 processors computer 
(Intel Core2 Quad Q6600), running the same operating system and Matlab ver- 
sion. It results that tsolve is about 6 times faster when the compiled version 
of clsolve is called. Moreover, toms729 is faster than our solvers, although it 
has the same order of complexity. 

Figure [2] reports the errors in the solution of the same set of real Toeplitz 
linear systems. In this case, the different pivoting strategies implemented in 
our package are compared; the results related to total pivoting were computed 
only for n < 4096. This figure shows that Sweet and Brent's pivoting (label 
"S&B") and Gu's pivoting may produce a very good approximation to total 
pivoting, leading to a significant improvement in accuracy; we verified that the 
execution time is not significantly affected by the additional load required by 
these pivoting techniques. 

In Figure[3]we compare the accuracy of three of our Toeplitz solvers, namely 
tsolve with partial and Gu's pivoting, and thsolve with partial pivoting (set- 
ting to zero the Hankel part of the matrix), to toms729 and Matlab backslash. 
These methods are applied to the same set of Toeplitz linear systems used in the 
previous experiments, and to Gaussian linear systems, whose matrix is defined 
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Figure 3: Error comparison between different solvers for a random real Toeplitz 
system (left) and a Gaussian system (right) of size 2 k , k = 8, ... ,15. 
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by 

a «=V^ e " f(w)2 ' CT=0 - 3 ' 

and whose asymptotic condition number is 6.96 • 10 6 [33] ■ The fact that Gu's 
pivoting produces very accurate results, generally comparable to Matlab back- 
slash, is confirmed. Moreover, the toms729 and thsolve functions are often 
less stable than the other methods, and there are cases in which partial pivoting 
leads to a substantial error amplification. The data displayed in Figures [TJ-[3] 
were computed by the scripts test2.m and test3.m. 




100 200 300 400 500 



Figure 4: Growth of the right generator H of the perturbed Bini-Boito example, 
when three different pivoting techniques are applied. 

It is reported in the literature (see, e.g., [53]) that the generalized Schur al- 
gorithm may cause a large growth of the generators entries, causing the error on 
the solution to be much larger than expected when standard Gaussian elimina- 
tion with partial pivoting is applied. One such example is described in [5], where 
the Sylvester matrix of two polynomials is considered. It is known that the rank 
deficiency of a Sylvester matrix equals the degree of the polynomials GCD. We 
consider a matrix of size 512, corresponding to two random polynomials with 
a common factor of degree 20, and, since any Sylvester matrix is Toeplitz-likc, 
we convert it to a Cauchy-like matrix. The generators of the matrix are then 
perturbed in order to make it nonsingular, and the resulting system is solved 
by clsolve, with either partial, Gu's or total pivoting; see test4.m. 

The largest absolute value of the right generator entries at each iteration is 
reported in Figure |U It is clear that, in this example, Gu's pivoting prevents 
generator growth as much as total pivoting, while partial pivoting produces an 
exponential growth. We remark that the norm of the solution errors correspond- 
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ing to partial, Gu's, total pivoting, and Matlab backslash, arc 1.1, 1.1 • 10 
2.5 • 1CT 6 , and 3.5 • 1CT 6 , respectively. 





partial 


Gu 


total 


backslash 


S&B displacement 
alternative displ. 


3.3 • lCT 3 
1.0 • 10" 14 


4.3 • 10~ 3 
1.3 • 10~ 14 


4.5 ■ 10~ 3 

4.6 • 10" 14 


6.5 • 10~ 5 
7.7 ■ 10" 15 


max.lGfVlG^I 
max fc |tf< fc) |/!tfl 0) | 


I. 6 

II. 3 


12.4 
1.0 


I. 6 

II. 3 





Table 5: Solution of the Sweet and Brent example, with two different choices 
of generators. The first two lines reports the errors in the solution of the linear 
system by clsolve (various pivoting) and backslash. For the first generators 
pair, the last two lines display the ratios between the maximum entry of the 
generators computed at each iteration step, and the maximum entry of the 
initial generators. 

We applied the same methods to the solution of an example proposed by 
Sweet and Brent in [29.. We consider a Cauchy-like matrix with knots on the 
unit circle, having the following generators 

d = [e e + rf] , H 1 = [e -e] , (16) 

with e = n- 1 / 2 ^, . . . , 1) T , f = n-^ 2 (-l, 1, . . . , (-1)") T , and r = 1(T 12 . These 
generators produce huge cancellations when the elements of the matrix are re- 
covered. To better investigate this example, we also considered the generators 

G 2 = -rf , H 2 = e, 

which give an equivalent representation of the matrix, preventing cancellation; 
see test5.m. From Table [5j it results that there is no growth associated to 
the generators (flBj) given by Sweet and Brent, and that the loss of accuracy is 
mainly due to the cancellation in the product G\H*. 

Algorithm [3] can be applied to a Cauchy-like matrix with multiple knots; see 
Section [3. 31 We verified that its numerical performance is not influenced by the 
knots multiplicity and that, when Sj ^ Sj, Vi =/= j, it is roughly as accurate as 
Algorithm [21 so we prefer to investigate here the particular situation of almost 
multiple knots. In the script test6.m, we construct a Cauchy-like matrix of size 
260 and displacement rank 5, with random generators. Its knots are obtained 
starting from 52 equispaced points on the unit circle, and replicating them 4 
times, each time with a perturbation producing a relative error less than r. 
Figure [5] reports the errors obtained solving the resulting linear system for t = 
10~ 8 , 10" 10 , . . . , 10~ 16 . Our clsolve function was applied with various pivoting 
techniques; the results labelled as "coll. knots" were obtained by collapsing the 
knots, i.e., removing the perturbation t, and then solving the system using 
Algorithm [U which includes partial pivoting. In this way, a different system is 
solved, since the matrix is perturbed, but when r tends to zero this approach 
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Figure 5: Error in the solution of a Cauchy-like linear system of size 260 with 
almost multiple knots. Each knot is repeated 5 times, with a different random 
perturbation producing a relative error less than r; the values of r are reported 
on the horizontal axis. 

is much more accurate than Algorithm [2] The condition number of the matrix 
is about 10 5 for any r; this fact is confirmed by the results obtained by Matlab 
backslash. 

While working on this paper, we became aware of another approach to apply 
the generalized Schur algorithm to a Cauchy-like matrix, using partial pivoting, 
and requiring 0(n) memory locations 24 . As the author kindly sent us the code 
developed for his paper, we report a comparison of his method and tsolve with 
partial pivoting. Figure [6] reports the errors and the execution times obtained 
by applying the Matlab version of both methods to the solution of random 
real Toeplitz linear systems; the results were computed by suitably modifying 
the script test2.m. It results that the algorithms are essentially equivalent. 
Performing the same test using the compiled versions of both methods produces 
the same errors, but our implementation appears to run faster. We believe that 
this is due to a different level of code optimization. 
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Figure 6: Comparison between tsolve with partial pivoting and the function 
casv from |24j . Both methods are applied to random real Toeplitz systems of 
size 2 fe , fc = 7,...,13. 
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